High dimensional nearest neighbor searching
Abstract
As databases increasingly integrate different types of information, such as time-series, multimedia, and scientific data, it becomes necessary to support efficient retrieval of multi-dimensional data. Both the dimensionality and the amount of data that must be processed are increasing rapidly, and because of this scale and high dimensionality, traditional techniques have proven inadequate. In this paper, we propose search techniques that are effective especially for large, high-dimensional data sets. We first propose the VA+-file technique, which is based on scalar quantization of the data. The VA+-file is especially useful for searching exact nearest neighbors (NN) in non-uniform, high-dimensional data sets. We then discuss how to improve the search and make it progressive by allowing some approximation in the query result. We develop a general framework for approximate NN queries, discuss various approaches for progressive processing of similarity queries, and develop a metric for evaluating such techniques. Finally, we propose a new technique based on clustering, which merges the benefits of various approaches to progressive similarity searching. Extensive experimental evaluation on several real-life data sets establishes the superiority of the proposed techniques over existing techniques for high-dimensional similarity searching. The proposed techniques are effective for real-life data sets, which are typically non-uniform, and they scale with respect to both the dimensionality and the size of the data set.
© 2005 Elsevier B.V. All rights reserved.
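The core idea behind scalar-quantization search, approximating each vector by one small cell index per dimension and using those cells to bound distances before reading any exact vector, can be sketched as a two-phase filter-and-refine search. This is a minimal, hypothetical sketch of the generic VA-file-style approach, not the VA+-file itself: the function names (`build_boundaries`, `cell_of`, `approximate`, `bounds_sq`, `va_nearest`) are illustrative, and the equi-depth (quantile) cell boundaries are an assumed, simple way of adapting cells to non-uniform data. The paper's own bit allocation and quantizer design are not reproduced here.

```python
import bisect
import math
import random


def build_boundaries(data, bits):
    """Per-dimension cell boundaries (2**bits cells per dimension).

    Assumption for illustration: boundaries are empirical quantiles
    (equi-depth cells), so dense regions of a skewed dimension get
    narrower cells. This is NOT the VA+-file's own quantizer design.
    """
    n_cells = 1 << bits
    boundaries = []
    for d in range(len(data[0])):
        vals = sorted(p[d] for p in data)
        inner = [vals[(k * len(vals)) // n_cells] for k in range(1, n_cells)]
        boundaries.append([vals[0]] + inner + [vals[-1]])
    return boundaries


def cell_of(value, bounds):
    """Cell index c with bounds[c] <= value <= bounds[c + 1]."""
    c = bisect.bisect_right(bounds, value) - 1
    return min(max(c, 0), len(bounds) - 2)  # clamp the maximum value into the last cell


def approximate(data, boundaries):
    """Compact approximation: one small cell index per dimension."""
    return [tuple(cell_of(x, b) for x, b in zip(p, boundaries)) for p in data]


def bounds_sq(query, cells, boundaries):
    """Squared lower/upper bounds on the distance from query to any point in the cell."""
    lb = ub = 0.0
    for q, c, b in zip(query, cells, boundaries):
        lo, hi = b[c], b[c + 1]
        gap = lo - q if q < lo else (q - hi if q > hi else 0.0)
        far = max(abs(q - lo), abs(q - hi))
        lb += gap * gap
        ub += far * far
    return lb, ub


def va_nearest(query, data, approx, boundaries):
    """Exact 1-NN by filter-and-refine over the approximations."""
    # Phase 1 (filter): scan only the approximations.
    scored = [(*bounds_sq(query, cells, boundaries), i) for i, cells in enumerate(approx)]
    min_ub = min(ub for _, ub, _ in scored)
    candidates = sorted((lb, i) for lb, ub, i in scored if lb <= min_ub)
    # Phase 2 (refine): exact distances in increasing lower-bound order.
    best_i, best_sq, visited = -1, math.inf, 0
    for lb, i in candidates:
        if lb > best_sq:  # no remaining candidate can be closer
            break
        visited += 1
        d_sq = sum((a - b) ** 2 for a, b in zip(query, data[i]))
        if d_sq < best_sq:
            best_i, best_sq = i, d_sq
    return best_i, math.sqrt(best_sq), visited


if __name__ == "__main__":
    rng = random.Random(0)
    # Skewed synthetic data: the spread grows with the dimension index.
    data = [[rng.gauss(0, 1 + d) for d in range(8)] for _ in range(2000)]
    boundaries = build_boundaries(data, bits=3)
    approx = approximate(data, boundaries)
    q = [rng.gauss(0, 1 + d) for d in range(8)]
    idx, dist, visited = va_nearest(q, data, approx, boundaries)
    print(idx, round(dist, 3), visited)
```

Phase 1 reads only the compact approximations. Any point whose lower bound exceeds the smallest upper bound cannot be the nearest neighbor, so it is discarded. Phase 2 computes exact distances for the remaining candidates in increasing lower-bound order and stops once no candidate can beat the current best; `visited` reports how many exact distance computations were needed.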
Similar resources
A Parallel Algorithms on Nearest Neighbor Search
(k-)nearest neighbor searching has very high computational costs. Algorithms presented for nearest neighbor search in high-dimensional spaces have suffered from the curse of dimensionality, which severely affects either their runtime or their storage requirements. Parallelization of nearest neighbor search is a suitable solution for decreasing the workload caused by nearest neigh...
Non-zero probability of nearest neighbor searching
Nearest Neighbor (NN) searching is a challenging problem in data management and has been widely studied in data mining, pattern recognition and computational geometry. The goal of NN searching is efficiently reporting the nearest data to a given object as a query. In most of the studies both the data and query are assumed to be precise, however, due to the real applications of NN searching, suc...
The Analysis of a Probabilistic Approach to Nearest Neighbor Searching
Given a set S of n data points in some metric space and a query point q in this space, a nearest neighbor query asks for the point of S nearest to q. Throughout, we will assume that the space is real d-dimensional space ℝ^d and that the metric is Euclidean distance. The goal is to preprocess S into a data structure so that such queries can be answered efficiently. Nearest neighbor searching has ap...
Nearest-Neighbor Searching and Metric Space Dimensions
Given a set S of points in a metric space with distance function D, the nearest-neighbor searching problem is to build a data structure for S so that for an input query point q, the point s ∈ S that minimizes D(s, q) can be found quickly. We survey approaches to this problem, and its relation to concepts of metric space dimension. Several measures of dimension can be estimated using nearest-nei...
An efficient nearest neighbor search in high-dimensional data spaces
Similarity search in multimedia databases requires an efficient support of nearest neighbor search on a large set of high-dimensional points. A technique applied for similarity search in multimedia databases is to transform important properties of the multimedia objects into points of a high-dimensional feature space. The feature space is usually indexed using a multidimensional index structure...
Solving approximate similarity queries
As we know, both nearest neighbor and range searching problems are among the most important and fundamental problems in computational geometry because of their numerous important application areas [1, 2]. In particular, in many modern database applications, high-dimensional searching problems arise when complex objects are represented by vectors of d numeric features. As the dimension d increases hig...
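The nearest-neighbor query defined in the entries above (for a point set S and a query point q, report the s ∈ S minimizing D(s, q), with D the Euclidean distance) has a simple exact baseline: a linear scan. The sketch below shows that baseline and adds one common formalization of an approximate answer, a (1 + ε)-approximate nearest neighbor, whose distance to q is at most (1 + ε) times the true nearest distance. That definition is an assumption here, since the truncated entries do not state which notion of approximation they use, and the names `nearest_neighbor` and `is_approx_nn` are hypothetical.

```python
import math


def nearest_neighbor(points, q):
    """Exact nearest neighbor by linear scan: the s in points minimizing Euclidean D(s, q)."""
    return min(points, key=lambda s: math.dist(s, q))


def is_approx_nn(points, q, candidate, eps):
    """Assumed definition: candidate is within (1 + eps) times the true NN distance of q."""
    best = math.dist(nearest_neighbor(points, q), q)
    return math.dist(candidate, q) <= (1 + eps) * best


S = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
q = (0.9, 1.2)
print(nearest_neighbor(S, q))                    # (1.0, 1.0)
print(is_approx_nn(S, q, (0.0, 0.0), eps=0.5))   # False
print(is_approx_nn(S, q, (0.0, 0.0), eps=10))    # True
```

The linear scan costs O(nd) per query; index structures and approximation schemes aim to beat this cost, which is what becomes hard as d grows.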
Journal: Inf. Syst.
Volume 31
Publication date: 2006